Abstract: This paper extends some geometric properties of a one-parameter family of relative entropies. These arise as redundancies when cumulants of compressed lengths are considered instead of expected compressed lengths. These parametric relative entropies are a generalization of the Kullback-Leibler divergence. They satisfy the Pythagorean property and behave like squared distances. This property, which was known for finite alphabet spaces, is now extended to general measure spaces. Existence of projections onto convex and certain closed sets is also established. Our results may have applications in the Rényi entropy maximization rule of statistical physics.

I. Introduction 

Relative entropy or Kullback-Leibler divergence $I(P\|Q)$ between two probability measures is a fundamental quantity that arises in a variety of situations in probability, statistics, and information theory. It serves as a measure of dissimilarity or divergence between two probability measures $P$ and $Q$ on a given measure space. In information theory, it is well known that $I(P\|Q)$ is the penalty in expected compressed length, i.e., its gap from Shannon entropy $H(P)$, when the compressor assumes that the (finite-alphabet) source probability measure is $Q$ instead of the true probability measure $P$.

Rényi entropies $H_\alpha(P)$ for $\alpha \in (0, \infty)$ play the role of Shannon entropy when the normalized cumulant of compression length is considered instead of expected compression length. Indeed, Campbell [1] showed that

$$\min \frac{1}{n\rho} \log E\left[\exp\{\rho L_n(X^n)\}\right] \to H_\alpha(P) \quad (\text{as } n \to \infty)$$

for an independent and identically distributed (iid) source with marginal $P$. The minimum is over all compression strategies that satisfy the Kraft inequality, $\alpha = 1/(1+\rho)$, and $\rho > 0$ is the cumulant parameter. We also have $\lim_{\alpha \to 1} H_\alpha(P) = H(P)$, so that Rényi entropy may be viewed as a generalization of Shannon entropy.
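The role of Rényi entropy as a cumulant-optimal compression limit can be illustrated numerically. The following is a minimal sketch, not Campbell's asymptotic result itself: it assumes a single source symbol ($n = 1$), natural logarithms, and idealized real-valued lengths $-\log p'(x)$ built from the $\alpha$-tilt of $P$ (integer rounding is ignored). The function names and the example distribution are illustrative assumptions.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def normalized_cumulant(p, lengths, rho):
    """(1/rho) * log E[exp(rho * L(X))] for one source symbol."""
    return math.log(sum(x * math.exp(rho * l) for x, l in zip(p, lengths))) / rho

p = [0.5, 0.3, 0.2]          # illustrative source distribution (assumption)
rho = 1.0                    # cumulant parameter
alpha = 1.0 / (1.0 + rho)

# Idealized lengths -log p'(x), where p' is the alpha-tilt of p.
# They satisfy the Kraft inequality (in nats) with equality.
z = sum(x ** alpha for x in p)
tilt_lengths = [-math.log(x ** alpha / z) for x in p]

# Lengths matched to p itself, which are optimal only for expected length.
shannon_lengths = [-math.log(x) for x in p]

h_alpha = renyi_entropy(p, alpha)
cum_tilt = normalized_cumulant(p, tilt_lengths, rho)
cum_plain = normalized_cumulant(p, shannon_lengths, rho)
near_one = renyi_entropy(p, 0.9999)

print(h_alpha, cum_tilt, cum_plain, near_one, shannon_entropy(p))
```

In this idealized setting the tilted lengths attain $H_\alpha(P)$ exactly, the Shannon-matched lengths do no better, and $H_\alpha(P)$ approaches $H(P)$ as $\alpha \to 1$.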

If the compressor assumed that the true probability measure had marginal $Q$, instead of $P$, then the gap in the normalized cumulant's growth exponent from the optimal value (Rényi entropy) is an analogous parametric divergence quantity (introduced by Blumer and McEliece [2] and studied further by Sundaresan [3]), which we shall denote $I_\alpha(P,Q)$. The same quantity also arises when we study the gap from optimality of mismatched guessing exponents (see Arikan [4] as well as Sundaresan [3]). All these results are applicable to more general non-iid sources.
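The redundancy interpretation can be checked on a small example. The sketch below is an illustration under stated assumptions: a single symbol, natural logarithms, idealized real-valued lengths $-\log q'(x)$ designed for the wrong measure $Q$, and the finite-alphabet form of $I_\alpha$ taken from the definition given in Section II. The helper names and the two distributions are hypothetical.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha in nats."""
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)

def tilt(p, alpha):
    """alpha-tilt p' = p^alpha / sum(p^alpha)."""
    z = sum(x ** alpha for x in p)
    return [x ** alpha / z for x in p]

def alpha_relative_entropy(p, q, alpha):
    """I_alpha(P, Q) on a finite alphabet; q must be strictly positive."""
    rho = 1.0 / alpha - 1.0
    norm_p = sum(x ** alpha for x in p) ** (1.0 / alpha)
    q_tilt = tilt(q, alpha)
    total = sum((pi / norm_p) * qt ** (-rho) for pi, qt in zip(p, q_tilt))
    return math.log(total) / rho

def mismatched_cumulant(p, q, rho):
    """Normalized cumulant under P of lengths designed for Q."""
    alpha = 1.0 / (1.0 + rho)
    lengths = [-math.log(qt) for qt in tilt(q, alpha)]
    return math.log(sum(pi * math.exp(rho * l) for pi, l in zip(p, lengths))) / rho

p = [0.5, 0.3, 0.2]   # true source (assumption)
q = [0.2, 0.5, 0.3]   # mismatched model (assumption)
rho = 0.5
alpha = 1.0 / (1.0 + rho)

gap = mismatched_cumulant(p, q, rho) - renyi_entropy(p, alpha)
divergence = alpha_relative_entropy(p, q, alpha)
print(gap, divergence)
```

The two printed numbers agree, which is the single-symbol version of the statement that the penalty for mismatch is $I_\alpha(P,Q)$.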

As one might expect, it is known (see for example Johnson and Vignat [5, A.1]) that $\lim_{\alpha \to 1} I_\alpha(P,Q) = I(P\|Q)$, so that we may think of relative entropy as $I_1(P,Q)$, and therefore of $I_\alpha$ as a generalization of relative entropy, i.e., an $\alpha$-relative entropy. Furthermore, for probability measures on a finite alphabet set, $I_\alpha$ behaves like squared Euclidean distance, and satisfies a "Pythagorean property" [3] like relative entropy and squared Euclidean distance. One purpose of this paper is to extend this property to probability measures on a general measure space with some common dominating measure.

The maximum entropy principle is a well-known selection rule, in the presence of uncertainty, in statistics. For a source alphabet $\mathbb{X}$ with finite cardinality, by noting that $I(P\|U) = \log|\mathbb{X}| - H(P)$ with $U$ taken as the uniform measure on the finite alphabet set $\mathbb{X}$, the maximum entropy principle is the same as the minimum relative entropy principle, an idea that goes back to Boltzmann, and one which is supported by the theory of large deviations. Indeed, suppose that certain ensemble average measurements can be made on a realization of a sequence of iid random variables (mean, second moment, etc.). The resulting realization must have an empirical measure that obeys the constraints placed by the observations. In particular, the empirical measure belongs to a convex (and possibly closed) set. Large deviations theory tells us that, amongst the measures that respect the constraints, the one that minimizes relative entropy is exponentially more likely than the others. The resulting measure is called the $I$-projection and was extensively studied by Csiszár [6], [7], and more recently by Csiszár and Matúš [8]. $I$-minimization arises similarly in the contraction principle of large deviations theory (see for example Dembo and Zeitouni [9, p. 126]).

As a natural alternative selection principle, the maximum Rényi entropy principle has been recently considered. This principle is equivalent to maximizing the Tsallis entropy, which is a monotone function of the Rényi entropy. See for example Jizba and Arimitsu [10] and references therein. More interestingly, Jizba and Arimitsu [10] indicate that the maximum Rényi entropy principle may be viewed as a maximum Shannon entropy principle with multifractal constraints. This selection principle has been of recent interest in statistical physics settings because Rényi entropy maximizers under a covariance constraint are distributions with a power-law decay (when $\alpha > 1$). See Costa et al. [11] or Johnson and Vignat [5]. Several empirical observations in naturally arising physical and socio-economic systems possess a power-law decay. Without going into these aspects, we remark that $I_\alpha(P,U) = \log|\mathbb{X}| - H_\alpha(P)$, so that both the maximum Rényi entropy principle and the maximum Tsallis entropy principle are equivalent to a minimum $\alpha$-relative entropy (minimum $I_\alpha$) principle. Thus one needs to find, amongst empirical measures that meet the observation constraints, the one that minimizes $I_\alpha$. We shall call this the $I_\alpha$-projection. While existence and uniqueness of the $I_\alpha$-projection was proved by Sundaresan [3] for the finite alphabet case, the second purpose of this paper is to extend these results to more general measure spaces.

It is known (see for example [3]) that $I_\alpha(P,Q)$ is the more commonly studied Rényi divergence of order $1/\alpha$, not of the original measures $P$ and $Q$, but of their tilts $P'$ and $Q'$, where $P'(x) = P(x)^\alpha / Z(P)$, and $Z(P)$ is the normalization that makes $P'$ a probability measure. $Q'$ is similarly defined. While the Rényi divergences arise naturally in hypothesis testing problems (see for example Csiszár [12]), $I_\alpha$ arises more naturally as a redundancy for mismatched compression.

$I_\alpha$ is also a certain monotone function of Csiszár's $f$-divergence between $P'$ and $Q'$. As a consequence of the appearance of the tilts, the data-processing property satisfied by $f$-divergences does not hold for the $\alpha$-relative entropy. Surprisingly though, the Pythagorean property holds.

The rest of the paper is organized as follows. In Section II we provide the definitions and demonstrate the existence of $I_\alpha$-projections on certain closed and convex sets. In Section III we extend the Pythagorean property to general measure spaces (with a common dominating measure), and identify the consequences with respect to iterated projections. In Section IV we summarize our results.

II. $I_\alpha$-Projection

We first formalize the definition of $\alpha$-relative entropy to a general probability space.

Let $P$ and $Q$ be two probability measures on a measure space $(\mathbb{X}, \mathcal{X})$. Let $\alpha \in (0, \infty)$ with $\alpha \neq 1$. By setting $\alpha = 1/(1+\rho)$ we have the reparameterization in terms of $\rho$ with $-1 < \rho < \infty$ and $\rho \neq 0$. Let $\mu$ be a dominating $\sigma$-finite measure on $(\mathbb{X}, \mathcal{X})$ with respect to which $P$ and $Q$ are both absolutely continuous, denoted $P \ll \mu$ and $Q \ll \mu$. We denote $p = dP/d\mu$ and $q = dQ/d\mu$ and assume that they are in the complete metric space $L^\alpha(\mu)$ with metric

$$d(f,g) = \left(\int |f-g|^\alpha \, d\mu\right)^{\min\{1, 1/\alpha\}}.$$

We shall use the notation

$$\|f\| := \left(\int |f|^\alpha \, d\mu\right)^{1/\alpha}$$

even though it is not a norm for $\alpha < 1$. (The dependence of this quantity on $\alpha$ should be borne in mind.) The Rényi entropy of $P$ of order $\alpha$ (with respect to $\mu$) is given by

$$H_\alpha(P) = \frac{1}{1-\alpha} \log\left(\int p^\alpha \, d\mu\right).$$

Consider the tilted measures $P'$ and $Q'$ given by

$$\frac{dP'}{d\mu} = p' := \frac{p^\alpha}{\int p^\alpha \, d\mu} \quad \text{and} \quad \frac{dQ'}{d\mu} = q' := \frac{q^\alpha}{\int q^\alpha \, d\mu}.$$

$P'$ and $Q'$ are also dominated by $\mu$. With

$$f(x) = \mathrm{sgn}(\rho) \cdot x^{1+\rho},$$

Csiszár's $f$-divergence [13] between two measures $P$ and $Q$, both absolutely continuous with respect to $\mu$, is given by

$$I_f(P,Q) = \int q \, f\!\left(\frac{p}{q}\right) d\mu.$$

Since $f$ is strictly convex when $\rho \neq 0$, by Jensen's inequality, $I_f(P,Q) \ge f(1)$ with equality if and only if $P = Q$. We now define the $\alpha$-relative entropy to be

$$I_\alpha(P,Q) := \frac{1}{\rho} \log\left[\mathrm{sgn}(\rho) \cdot I_f(P', Q')\right].$$

Abusing notation a little, when speaking of densities, we shall sometimes write $I_\alpha(p,q)$ for $I_\alpha(P,Q)$.
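On a finite alphabet with counting measure the definition can be evaluated directly, which also makes it easy to confirm the remark from Section I that $I_\alpha(P,Q)$ equals the Rényi divergence of order $1/\alpha$ between the tilts. The following is a small sketch under those assumptions; the function names and the test distributions are illustrative and $q$ is assumed strictly positive.

```python
import math

def tilt(p, alpha):
    """alpha-tilt p' = p^alpha / sum(p^alpha)."""
    z = sum(x ** alpha for x in p)
    return [x ** alpha / z for x in p]

def f_divergence(p, q, rho):
    """Csiszar f-divergence with f(x) = sgn(rho) * x**(1 + rho); q > 0."""
    sgn = 1.0 if rho > 0 else -1.0
    return sum(qi * sgn * (pi / qi) ** (1.0 + rho) for pi, qi in zip(p, q))

def alpha_relative_entropy(p, q, alpha):
    """I_alpha(P, Q) = (1/rho) * log[sgn(rho) * I_f(P', Q')]."""
    rho = 1.0 / alpha - 1.0
    sgn = 1.0 if rho > 0 else -1.0
    value = sgn * f_divergence(tilt(p, alpha), tilt(q, alpha), rho)
    return math.log(value) / rho

def renyi_divergence(p, q, order):
    """Renyi divergence of the given order (order != 1), in nats."""
    total = sum(pi ** order * qi ** (1.0 - order) for pi, qi in zip(p, q))
    return math.log(total) / (order - 1.0)

p = [0.5, 0.3, 0.2]   # illustrative distributions (assumptions)
q = [0.2, 0.5, 0.3]

for alpha in (0.5, 2.0):
    direct = alpha_relative_entropy(p, q, alpha)
    via_tilts = renyi_divergence(tilt(p, alpha), tilt(q, alpha), 1.0 / alpha)
    print(alpha, direct, via_tilts, alpha_relative_entropy(p, p, alpha))
```

Both routes give the same value for $\alpha < 1$ and $\alpha > 1$, and the divergence of a measure from itself is zero.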

We now summarize the anticipated properties of $\alpha$-relative entropy.

Lemma 1: The following properties hold.

1) $I_\alpha(P,Q) \ge 0$ with equality if and only if $P = Q$.

2) Under certain regularity conditions, $\lim_{\alpha \to 1} I_\alpha(P,Q) = I(P\|Q)$.

3) Let $\mathbb{X} = \mathbb{R}^n$ and let $\mu$ be the Lebesgue measure on $\mathbb{R}^n$. For $\alpha > n/(n+2)$ and $\alpha \neq 1$, define the constant $b_\alpha = (1-\alpha)/(2\alpha - n(1-\alpha))$. With $C$ a positive definite covariance matrix, the function

$$g_{\alpha,C}(x) = Z_\alpha^{-1} \left[1 + b_\alpha \cdot x^T C^{-1} x\right]_+^{\frac{1}{\alpha-1}},$$

with $[a]_+ := \max\{a, 0\}$ and $Z_\alpha$ the normalization constant, is the density function of a probability measure on $\mathbb{R}^n$ whose covariance matrix is $C$. Furthermore, if $g$ is the density function of any other random variable with covariance matrix $C$, then

$$I_\alpha(g, g_{\alpha,C}) = H_\alpha(g_{\alpha,C}) - H_\alpha(g). \tag{1}$$

Consequently $g_{\alpha,C}$ is the density function of the Rényi entropy maximizer among all $\mathbb{R}^n$-valued random vectors with covariance matrix $C$.

4) Let $|\mathbb{X}| < \infty$ and let $U$ be the uniform probability mass function on $\mathbb{X}$. Then $I_\alpha(P,U) = \log|\mathbb{X}| - H_\alpha(P)$. □
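Statements 1), 2), and 4) of Lemma 1 are easy to probe numerically on a finite alphabet. The sketch below is only a sanity check on one hand-picked pair of distributions, not a proof. It uses the equivalent form $I_\alpha(P,Q) = \frac{\alpha}{1-\alpha}\log\sum_x \frac{p(x)}{\|p\|}\big(\frac{q(x)}{\|q\|}\big)^{\alpha-1}$, which follows from the definition because $(q')^{-\rho} = (q/\|q\|)^{\alpha-1}$; the function names are illustrative.

```python
import math

def norm(p, alpha):
    """||p|| = (sum p^alpha)^(1/alpha); not a true norm when alpha < 1."""
    return sum(x ** alpha for x in p) ** (1.0 / alpha)

def alpha_relative_entropy(p, q, alpha):
    """I_alpha(P, Q) on a finite alphabet; q must be strictly positive."""
    np_, nq = norm(p, alpha), norm(q, alpha)
    total = sum((pi / np_) * (qi / nq) ** (alpha - 1.0) for pi, qi in zip(p, q))
    return alpha / (1.0 - alpha) * math.log(total)

def renyi_entropy(p, alpha):
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]          # illustrative distributions (assumptions)
q = [0.2, 0.5, 0.3]
u = [1.0 / 3.0] * 3          # uniform measure on a 3-letter alphabet

for alpha in (0.3, 0.7, 1.5, 3.0):
    # Statement 1: nonnegativity, with zero at P = Q.
    print(alpha, alpha_relative_entropy(p, q, alpha),
          alpha_relative_entropy(p, p, alpha))
    # Statement 4: I_alpha(P, U) = log|X| - H_alpha(P).
    print(alpha_relative_entropy(p, u, alpha),
          math.log(3) - renyi_entropy(p, alpha))

# Statement 2: values close to alpha = 1 approach the KL divergence.
print(alpha_relative_entropy(p, q, 0.9999), kl_divergence(p, q))
```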

Proof: We only give an outline here. Statement 1) follows by an application of Hölder's inequality by considering the Hölder conjugates $\alpha$ and $\alpha/(\alpha-1)$, and the functions $p/\|p\|$ and $(q/\|q\|)^{\alpha-1}$. Statement 2) follows by an application of L'Hôpital's rule and some conditions that enable interchange of differentiation with respect to the parameter $\alpha$ and integration with respect to $\mu$. Statement 3) was proved by Lutwak et al. [14]. See also Johnson and Vignat [5]. For relative entropy, the analog of (1) under a covariance constraint would be $I(g\|\phi) = H(\phi) - H(g)$, where $H$ is differential entropy and $\phi$ is the Gaussian distribution with the same covariance as $g$. The last statement follows from the definition. ■

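We next prove an inequality relating $f$-divergences. This yields the parallelogram identity for relative entropy ($\alpha = 1$) [6].

Lemma 2: Let $\alpha < 1$. Let $P_1, P_2, R$ be probability measures that are absolutely continuous with respect to $\mu$, and let the corresponding Radon-Nikodym derivatives $p_1, p_2$, and $r$ be in $L^\alpha(\mu)$. Assume $0 < \lambda < 1$. We then have

$$\begin{aligned}
&\lambda\left[I_f(P_1', R') - f(1)\right] + (1-\lambda)\left[I_f(P_2', R') - f(1)\right] \\
&\quad - \lambda\left[I_f(P_1', R_{1,2}') - f(1)\right] - (1-\lambda)\left[I_f(P_2', R_{1,2}') - f(1)\right] \\
&\ge \left[I_f(R_{1,2}', R') - f(1)\right]
\end{aligned} \tag{2}$$

where

$$R_{1,2} = \frac{\dfrac{\lambda}{\|p_1\|} P_1 + \dfrac{1-\lambda}{\|p_2\|} P_2}{\dfrac{\lambda}{\|p_1\|} + \dfrac{1-\lambda}{\|p_2\|}}. \tag{3}$$

When $\alpha > 1$, the reversed inequality holds in (2). □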
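Proof: We briefly outline the steps. Let $r_{1,2} = dR_{1,2}/d\mu$. Observe that since $I_f(\cdot,\cdot) \ge f(1)$, a consequence of Jensen's inequality indicated earlier, all terms within square brackets are nonnegative. The left-hand side of inequality (2) can be expanded to

$$\begin{aligned}
&\mathrm{sgn}(\rho) \int \frac{\lambda p_1}{\|p_1\|} \left[\left(\frac{r}{\|r\|}\right)^{\alpha-1} - \left(\frac{r_{1,2}}{\|r_{1,2}\|}\right)^{\alpha-1}\right] d\mu \\
&\quad + \mathrm{sgn}(\rho) \int \frac{(1-\lambda) p_2}{\|p_2\|} \left[\left(\frac{r}{\|r\|}\right)^{\alpha-1} - \left(\frac{r_{1,2}}{\|r_{1,2}\|}\right)^{\alpha-1}\right] d\mu \\
&= \mathrm{sgn}(\rho) \left(\frac{\lambda}{\|p_1\|} + \frac{1-\lambda}{\|p_2\|}\right) \int r_{1,2} \left[\left(\frac{r}{\|r\|}\right)^{\alpha-1} - \left(\frac{r_{1,2}}{\|r_{1,2}\|}\right)^{\alpha-1}\right] d\mu \\
&= \left(\frac{\lambda}{\|p_1\|} + \frac{1-\lambda}{\|p_2\|}\right) \|r_{1,2}\| \cdot \left[I_f(R_{1,2}', R') - f(1)\right].
\end{aligned}$$

Applying Minkowski's inequality in (3) with $\alpha < 1$, we get

$$\left(\frac{\lambda}{\|p_1\|} + \frac{1-\lambda}{\|p_2\|}\right) \|r_{1,2}\| \ge 1.$$

This inequality gets reversed when $\alpha > 1$, again by a version of Minkowski's inequality. Since $I_f(R_{1,2}', R') - f(1) \ge 0$, the lemma follows. ■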
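Inequality (2) and its reversal can be checked on a finite alphabet. The sketch below is an illustration under stated assumptions (counting measure, strictly positive hand-picked distributions, hypothetical helper names). It computes the left-hand side minus the right-hand side of (2), together with the Minkowski factor $(\lambda/\|p_1\| + (1-\lambda)/\|p_2\|)\,\|r_{1,2}\|$ from the proof.

```python
def norm(p, alpha):
    return sum(x ** alpha for x in p) ** (1.0 / alpha)

def tilt(p, alpha):
    z = sum(x ** alpha for x in p)
    return [x ** alpha / z for x in p]

def f_div(p, q, rho):
    """I_f(P, Q) with f(x) = sgn(rho) * x**(1 + rho); q > 0."""
    sgn = 1.0 if rho > 0 else -1.0
    return sum(qi * sgn * (pi / qi) ** (1.0 + rho) for pi, qi in zip(p, q))

def lemma2_terms(p1, p2, r, lam, alpha):
    """Return (lhs - rhs of (2), Minkowski factor, rhs of (2))."""
    rho = 1.0 / alpha - 1.0
    f1 = 1.0 if rho > 0 else -1.0                 # f(1) = sgn(rho)
    w1, w2 = lam / norm(p1, alpha), (1.0 - lam) / norm(p2, alpha)
    r12 = [(w1 * a + w2 * b) / (w1 + w2) for a, b in zip(p1, p2)]  # eq. (3)
    t = lambda p: tilt(p, alpha)
    lhs = (lam * (f_div(t(p1), t(r), rho) - f1)
           + (1.0 - lam) * (f_div(t(p2), t(r), rho) - f1)
           - lam * (f_div(t(p1), t(r12), rho) - f1)
           - (1.0 - lam) * (f_div(t(p2), t(r12), rho) - f1))
    rhs = f_div(t(r12), t(r), rho) - f1
    factor = (w1 + w2) * norm(r12, alpha)
    return lhs - rhs, factor, rhs

p1 = [0.6, 0.3, 0.1]   # illustrative distributions (assumptions)
p2 = [0.1, 0.2, 0.7]
r = [0.3, 0.4, 0.3]

gap_lt1, factor_lt1, rhs_lt1 = lemma2_terms(p1, p2, r, 0.4, 0.5)
gap_gt1, factor_gt1, rhs_gt1 = lemma2_terms(p1, p2, r, 0.4, 2.0)
print(gap_lt1, factor_lt1)   # expect gap >= 0 and factor >= 1 for alpha < 1
print(gap_gt1, factor_gt1)   # expect gap <= 0 and factor <= 1 for alpha > 1
```

The gap equals (factor − 1) times the right-hand side of (2), which is exactly how the proof obtains the inequality.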

Let us define what we mean by an $I_\alpha$-projection.

Definition 3: If $E$ is a set of probability measures on $(\mathbb{X}, \mathcal{X})$ such that $I_\alpha(P,R) < \infty$ for some $P \in E$, a measure $Q \in E$ satisfying

$$I_\alpha(Q,R) = \inf_{P \in E} I_\alpha(P,R) \tag{4}$$

is called the $I_\alpha$-projection of $R$ on $E$. □
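Let $E$ be a set of probability measures on $(\mathbb{X}, \mathcal{X})$. Let $\mu$ be a common ($\sigma$-finite) dominating measure for $E$. Write

$$\mathcal{E} = \left\{ p = \frac{dP}{d\mu} : P \in E \right\}$$

and assume that $\mathcal{E} \subset L^\alpha(\mu)$. Now define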



P 



We are now ready to state our main result on the existence of the $I_\alpha$-projection.

Theorem 4: Let $\alpha \in (0, \infty)$ and $\alpha \neq 1$. Let $E$ be a set of probability measures with dominating $\sigma$-finite measure $\mu$ such that the subset of functions $\mathcal{E}$ is convex and closed in $L^\alpha(\mu)$. Let $R$ be a probability measure and suppose that $I_\alpha(P,R) < \infty$ for some $P \in E$. Then $R$ has an $I_\alpha$-projection on $E$. □

Remark 1: The closure of $\mathcal{E}$ in $L^\alpha(\mu)$, for $\alpha = 1$, would be closure in the total variation metric, which is one of the hypotheses in Csiszár's [6, Th. 2.1]. The proof ideas are different for the two cases $\alpha < 1$ and $\alpha > 1$. The proof for $\alpha < 1$ is a modification of Csiszár's approach in [6]. The proof for $\alpha > 1$ exploits properties of sets that are convex and closed under the weak topology. We are indebted to Pietro Majer for suggesting some key steps on the mathoverflow.net forum.

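Proof: (a) We first consider the case $\alpha < 1$. Pick a sequence $P_n \in E$ such that $I_f(P_n', R') < \infty$ and

$$I_f(P_n', R') \to \inf_{P \in E} I_f(P', R'). \tag{5}$$

By Lemma 2, we have

$$\begin{aligned}
&\lambda I_f(P_m', R') + (1-\lambda) I_f(P_n', R') \\
&\quad - \lambda I_f(P_m', R_{m,n}') - (1-\lambda) I_f(P_n', R_{m,n}') \\
&\ge \left[I_f(R_{m,n}', R') - 1\right]
\end{aligned} \tag{6}$$

where

$$R_{m,n} = \frac{\dfrac{\lambda}{\|p_m\|} P_m + \dfrac{1-\lambda}{\|p_n\|} P_n}{\dfrac{\lambda}{\|p_m\|} + \dfrac{1-\lambda}{\|p_n\|}} \in E$$

on account of the convexity of $E$. Rearranging (6) and using $I_f(\cdot,\cdot) \ge f(1) = 1$, we get

$$\begin{aligned}
1 &\le \lambda I_f(P_m', R_{m,n}') + (1-\lambda) I_f(P_n', R_{m,n}') \\
&\le \lambda I_f(P_m', R') + (1-\lambda) I_f(P_n', R') - I_f(R_{m,n}', R') + 1.
\end{aligned}$$

Take the limit as $m, n \to \infty$. The expression on the right-most side is at most 1 in the limit because $I_f(P_m', R')$ and $I_f(P_n', R')$ approach the infimum value, and $I_f(R_{m,n}', R')$ is at least this infimum value for each $m$ and $n$. Since we also have $I_f(P_m', R_{m,n}') \ge 1$ and $I_f(P_n', R_{m,n}') \ge 1$, it follows that

$$\lim_{m,n \to \infty} \left[I_f(P_m', R_{m,n}') - 1\right] = 0.$$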

From ifTJl Th. 1], a generalization of Pinsker's inequality, we 
get that the total variation metric, denoted $|P - Q|$, is small if $I_f(P,Q) - 1$ is small. This fact and the above limit imply that

$$\lim_{m,n \to \infty} |P_m' - R_{m,n}'| = 0,$$

which, together with the triangle inequality for the total variation metric, yields

$$|P_m' - P_n'| \le |P_m' - R_{m,n}'| + |P_n' - R_{m,n}'| \to 0 \quad \text{as } m, n \to \infty,$$

i.e., the sequence $\{p_n'\}$ is a Cauchy sequence in $L^1(\mu)$. It must thus converge to some $g$ in $L^1(\mu)$, i.e.,

$$\lim_{n \to \infty} \int \left| \left(\frac{p_n}{\|p_n\|}\right)^\alpha - g \right| d\mu = 0. \tag{7}$$

There is then a subsequence over which one gets a.e.[$\mu$] convergence. Reindexing to operate on this subsequence, we get

$$\left(\frac{p_n}{\|p_n\|}\right)^\alpha \to g \quad \text{a.e.}[\mu].$$

We will now demonstrate that an $I_\alpha$-projection, say $Q$, is in $E$ and has $\mu$-density proportional to $g^{1/\alpha}$.

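In view of the a.e.[$\mu$] convergence, and after observing that

$$\left| \frac{p_n}{\|p_n\|} - g^{1/\alpha} \right|^\alpha \le 2^\alpha \left[ \left(\frac{p_n}{\|p_n\|}\right)^\alpha + g \right],$$

we can apply the generalized Dominated Convergence Theorem [15, Ch. 2, Problem 20] to get

$$\frac{p_n}{\|p_n\|} \to g^{1/\alpha} \quad \text{in } L^\alpha(\mu).$$

We next claim that

$$\|p_n\| \text{ is bounded.} \tag{8}$$

Suppose not; then working on a subsequence if needed, we have $\|p_n\| =: M_n \to \infty$. As $\int p_n \, d\mu = 1$, given any $\epsilon > 0$,

$$\mu\left\{ \frac{p_n}{M_n} > \epsilon \right\} \le \frac{1}{\epsilon M_n} \to 0 \quad \text{as } n \to \infty,$$

and hence $p_n/M_n \to 0$ in [$\mu$]-measure, which would be a contradiction to the fact that $\int p_n' \, d\mu = 1$ for all $n$ (so that $\int g \, d\mu = 1$). Thus (8) holds, and so we can find a subsequence of $\|p_n\|$ that converges to some $c$. Reindex and work on this subsequence to get $p_n \to c\,g^{1/\alpha}$ in $L^\alpha(\mu)$. Since $\mathcal{E}$ is closed in $L^\alpha(\mu)$, we obtain $c\,g^{1/\alpha} = q$ for some $q \in \mathcal{E}$, $c = \|q\|$, and $g = q^\alpha/\|q\|^\alpha = q'$. Let $Q$ be the probability measure in $E$ with $dQ/d\mu = q$.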

To complete the proof, we need to demonstrate that $I_f(P', R') \ge I_f(Q', R')$ for every $P \in E$. To see this, note that (7) implies that $p_n' \to q'$ in $L^1(\mu)$, and by a change of measure, $p_n'/r' \to q'/r'$ in $L^1(R')$, and hence in [$R'$]-measure. But $f$ is continuous, and so $f(p_n'/r') \to f(q'/r')$ in [$R'$]-measure. Fatou's lemma then implies

$$I_f(Q', R') \le \liminf_{n \to \infty} I_f(P_n', R') = \inf_{P \in E} I_f(P', R'). \tag{9}$$

Since $Q \in E$, equality must hold, and $Q$ is an $I_\alpha$-projection of $R$ on $E$. This completes the proof for the case when $\alpha < 1$.

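(b) We next consider the case when $\alpha > 1$. Note that $\rho$ is negative, and so the inf in (4) becomes a sup as follows. The $I_\alpha$-projection $Q$ must satisfy (4), which can be rewritten as

$$I_\alpha(Q,R) = \frac{1}{\rho} \log\left[ \sup_{p \in \mathcal{E}} \int \frac{p}{\|p\|} \left(\frac{r}{\|r\|}\right)^{\alpha-1} d\mu \right] = \frac{1}{\rho} \log\left[ \sup_{h \in \hat{\mathcal{E}}} \int h g \, d\mu \right] \tag{10}$$

where

$$\hat{\mathcal{E}} := \left\{ s \frac{p}{\|p\|} : p \in \mathcal{E}, \ 0 \le s \le 1 \right\}$$

and $g = (r/\|r\|)^{\alpha-1}$, an element of the dual space $L^{\alpha/(\alpha-1)}(\mu)$. We now claim that

$$\hat{\mathcal{E}} \text{ is a closed and convex subset of } L^\alpha(\mu). \tag{11}$$

Assume the claim. Since $L^\alpha(\mu)$ is a reflexive space, the closed and convex set $\hat{\mathcal{E}}$ is closed under the weak topology. Since $\hat{\mathcal{E}}$ is also contained in the closed unit ball in $L^\alpha(\mu)$, the closed unit ball being compact in the weak topology in a reflexive space, $\hat{\mathcal{E}}$ must be compact in the weak topology. The supremum is thus of a bounded linear functional over the weakly compact set $\hat{\mathcal{E}}$. It is therefore attained in $\hat{\mathcal{E}}$. Since the linear functional increases with $s$, the supremum is attained with $s = 1$. Thus the supremum in (10) over $p \in \mathcal{E}$ is attained.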

We now proceed to show the claim (11). To see convexity, let $p_1, p_2 \in \mathcal{E}$ and $0 \le s_1, s_2, \lambda \le 1$. Then

$$\lambda s_1 \frac{p_1}{\|p_1\|} + (1-\lambda) s_2 \frac{p_2}{\|p_2\|} =: s \frac{p}{\|p\|}$$

where

$$p = \frac{\dfrac{\lambda s_1}{\|p_1\|} p_1 + \dfrac{(1-\lambda) s_2}{\|p_2\|} p_2}{\dfrac{\lambda s_1}{\|p_1\|} + \dfrac{(1-\lambda) s_2}{\|p_2\|}} \in \mathcal{E}$$

by the convexity of $\mathcal{E}$. From Minkowski's inequality (for $\alpha > 1$), we also have

$$s = \left( \frac{\lambda s_1}{\|p_1\|} + \frac{(1-\lambda) s_2}{\|p_2\|} \right) \|p\| \le \lambda s_1 + (1-\lambda) s_2 \le 1,$$

and this establishes the convexity of $\hat{\mathcal{E}}$.

To see that $\hat{\mathcal{E}}$ is closed in $L^\alpha(\mu)$, let $\{g_n\} \subset \hat{\mathcal{E}}$ be a Cauchy sequence in $L^\alpha(\mu)$. Then $g_n = s_n p_n/\|p_n\|$, with $p_n \in \mathcal{E}$ and $0 \le s_n \le 1$, converges to some $g$ in $L^\alpha(\mu)$. By taking norms, we see that $\|g_n\| = s_n \to \|g\| \le 1$. If $g = 0$ a.e.[$\mu$], then $g \in \hat{\mathcal{E}}$ by taking $s = 0$, and we are done. Otherwise we can assume that $\|g_n\| > 0$ for all $n$ by focusing on a subsequence if needed, and that $\|g\| > 0$. We can thus conclude that $p_n/\|p_n\| = g_n/\|g_n\| \to g/\|g\|$ in $L^\alpha(\mu)$. Since $g \neq 0$, the same argument that showed (8) shows that $\|p_n\|$ is bounded, and by focusing on a subsequence, we may assume that it converges to some constant $c$. Hence $p_n \to c\,g/\|g\|$ in $L^\alpha(\mu)$. Since $\mathcal{E}$ is closed, we must have $c\,g/\|g\| = p$ for some $p \in \mathcal{E}$, $c = \|p\|$, and $g = \|g\| \, p/\|p\|$. Since we already established that $\|g\| \le 1$, it follows that $g \in \hat{\mathcal{E}}$. This completes the proof. ■

We close this section with a result on the continuity or the lower semicontinuity of $\alpha$-relative entropy.

Proposition 5: For a fixed $q$, consider $p \mapsto I_\alpha(p,q)$ as a function on $L^\alpha(\mu)$. This function is continuous for $\alpha > 1$ and lower semicontinuous for $\alpha < 1$. □

Proof: Let us first consider the case when $\alpha > 1$. Let $p_n \to p$ in $L^\alpha(\mu)$. Then $\|p_n\| \to \|p\|$ and so $p_n/\|p_n\| \to p/\|p\|$ in $L^\alpha(\mu)$. As mentioned in the proof of Theorem 4(b), $I_\alpha(p,q)$ is a monotone function of a bounded linear functional of $p/\|p\|$. Hence $I_\alpha(p,q)$ is continuous in $p$. For $\alpha < 1$ ($\rho > 0$) we write

$$I_\alpha(p,q) = \frac{1}{\rho} \log \int \left(\frac{p'}{q'}\right)^{1+\rho} dQ'.$$

Let $p_n \to p$ in $L^\alpha(\mu)$. Then $\|p_n\| \to \|p\|$ and since $|p_n^\alpha - p^\alpha| \le |p_n|^\alpha + |p|^\alpha$, the generalized Dominated Convergence Theorem yields

$$\left(\frac{p_n}{\|p_n\|}\right)^\alpha \to \left(\frac{p}{\|p\|}\right)^\alpha \quad \text{in } L^1(\mu),$$

i.e., $p_n' \to p'$ in $L^1(\mu)$. This is the same as saying $p_n'/q' \to p'/q'$ in $L^1(Q')$, and thus in [$Q'$]-measure. Hence it follows that $(p_n'/q')^{1+\rho} \to (p'/q')^{1+\rho}$ in [$Q'$]-measure. By Fatou's lemma,

$$\liminf_{n \to \infty} \int \left(\frac{p_n'}{q'}\right)^{1+\rho} dQ' \ge \int \left(\frac{p'}{q'}\right)^{1+\rho} dQ'.$$

As an increasing function of a lower semicontinuous function is lower semicontinuous, the result is established for $\alpha < 1$. ■

III. Pythagorean property 

In this section, we state the Pythagorean property for $\alpha$-relative entropy. We define the $I_\alpha$-sphere with center $R$ and radius $r$ as $S(R,r) = \{P : I_\alpha(P,R) < r\}$, $0 < r \le \infty$.

Theorem 6: Let $\alpha > 0$ and $\alpha \neq 1$. Let $\mu$ be a common dominating $\sigma$-finite measure.

1) If $I_\alpha(P,R)$ and $I_\alpha(Q,R)$ are finite, "the segment joining $P$ and $Q$" does not intersect the $I_\alpha$-sphere $S(R,r)$ with radius $r = I_\alpha(Q,R)$, i.e., $I_\alpha(P_\lambda, R) \ge I_\alpha(Q,R)$ for

$$P_\lambda = \lambda P + (1-\lambda) Q, \quad \lambda \in [0,1],$$

if and only if

$$I_\alpha(P,R) \ge I_\alpha(P,Q) + I_\alpha(Q,R). \tag{12}$$

2) If

$$Q = \lambda P + (1-\lambda) S, \quad 0 < \lambda < 1, \tag{13}$$

and $I_\alpha(Q,R)$ is finite, then the segment joining $P$ and $S$ does not intersect $S(R,r)$ with $r = I_\alpha(Q,R)$, if and only if $I_\alpha(P,R) = I_\alpha(P,Q) + I_\alpha(Q,R)$ and $I_\alpha(S,R) = I_\alpha(S,Q) + I_\alpha(Q,R)$. □

For the proof (see Appendix), we proceed as in [3], where it is proved for the finite alphabet case, with appropriate functional analytic justifications for the general alphabet case.

Once Theorem 6 is established in generality, the proofs of the following results are exactly as in [3].

Theorem 7: The following statements hold.

1) (Projection) A $Q \in E \cap S(R, \infty)$ is an $I_\alpha$-projection of $R$ on the convex set $E$ iff every $P \in E$ satisfies (12). If the $I_\alpha$-projection is an algebraic inner point of $E$ then $E \subset S(R, \infty)$ and (12) holds with equality.

2) (Uniqueness of $I_\alpha$-projection) If the $I_\alpha$-projection exists, it is unique.

3) (Iterative projection) Let $E$ and $E_1 \subset E$ be convex sets of probability measures, let $R$ have $I_\alpha$-projection $Q$ on $E$ and $Q_1$ on $E_1$, and suppose that (12) holds with equality for every $P \in E$. Then $Q_1$ is the $I_\alpha$-projection of $Q$ on $E_1$. □
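The projection statement can be explored on a toy example. The following is a rough numerical sketch, not the construction used in the proofs: it assumes a three-letter alphabet, takes $E$ to be the segment between two hand-picked distributions `a` and `b`, and locates the $I_\alpha$-projection of `r` by ternary search over the mixing weight (which relies on $I_\alpha(P_\lambda, R)$ being unimodal along a segment). It then evaluates the slack in the Pythagorean inequality (12) at the endpoints of the segment. All names are illustrative.

```python
import math

def norm(p, alpha):
    return sum(x ** alpha for x in p) ** (1.0 / alpha)

def alpha_relative_entropy(p, q, alpha):
    """I_alpha(P, Q) on a finite alphabet; q must be strictly positive."""
    np_, nq = norm(p, alpha), norm(q, alpha)
    total = sum((pi / np_) * (qi / nq) ** (alpha - 1.0) for pi, qi in zip(p, q))
    return alpha / (1.0 - alpha) * math.log(total)

def mix(a, b, lam):
    """Point lam * a + (1 - lam) * b on the segment E."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

def project_on_segment(a, b, r, alpha, iters=200):
    """Approximate I_alpha-projection of r on the segment [a, b]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (alpha_relative_entropy(mix(a, b, m1), r, alpha)
                < alpha_relative_entropy(mix(a, b, m2), r, alpha)):
            hi = m2
        else:
            lo = m1
    return mix(a, b, 0.5 * (lo + hi))

def pythagorean_slack(p, q, r, alpha):
    """I_alpha(P,R) - I_alpha(P,Q) - I_alpha(Q,R); nonnegative under (12)."""
    return (alpha_relative_entropy(p, r, alpha)
            - alpha_relative_entropy(p, q, alpha)
            - alpha_relative_entropy(q, r, alpha))

a = [0.7, 0.2, 0.1]   # endpoints of the convex set E (assumptions)
b = [0.1, 0.3, 0.6]
r = [0.2, 0.5, 0.3]   # measure being projected (assumption)

for alpha in (0.5, 2.0):
    q = project_on_segment(a, b, r, alpha)
    print(alpha, q, pythagorean_slack(a, q, r, alpha),
          pythagorean_slack(b, q, r, alpha))
```

When the minimizer lies strictly inside the segment, both slacks come out numerically zero, in line with the equality case for algebraic inner points in Theorem 7.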

IV. Summary 

We studied a parametric extension of relative entropy $I_\alpha$ for $\alpha > 0$ and $\alpha \neq 1$. These arose naturally as redundancies under mismatched compression and when normalized cumulants of compression lengths are considered ($0 < \alpha < 1$). We first studied $I_\alpha$ minimization problems and showed that projections exist on convex and closed sets (in $L^\alpha(\mu)$) when the sets are dominated by a $\sigma$-finite measure $\mu$. We then extended the Pythagorean property to general measure spaces. As a consequence, one also gets an iterated projections property. Axiomatic characterizations that lead to $I_\alpha$ minimization and Rényi entropy maximization are currently under investigation.
Appendix A
Proof of Theorem 6

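1) We first prove statement 1). We begin with the "only if" part. Under the hypothesis, it suffices to show that

$$I_f(P', R') \ge \mathrm{sgn}(\rho) \cdot I_f(P', Q') \cdot I_f(Q', R').$$

Now,

$$I_f(P', R') = \mathrm{sgn}(\rho) \int (p')^{1+\rho} (r')^{-\rho} \, d\mu = \frac{\mathrm{sgn}(\rho)}{\|p\|} \int p \, (r')^{-\rho} \, d\mu.$$

Therefore it suffices to show that

$$\mathrm{sgn}(\rho) \int p \, (r')^{-\rho} \, d\mu \ge \frac{\mathrm{sgn}(\rho)}{\|q\|} \int p \, (q')^{-\rho} \, d\mu \cdot \int q \, (r')^{-\rho} \, d\mu. \tag{14}$$

Now

$$I_f(P_\lambda', R') = \frac{\mathrm{sgn}(\rho)}{\|p_\lambda\|} \int p_\lambda \, (r')^{-\rho} \, d\mu = \frac{s(\lambda)}{t(\lambda)}$$

where

$$s(\lambda) := \mathrm{sgn}(\rho) \int p_\lambda \, (r')^{-\rho} \, d\mu \quad \text{and} \quad t(\lambda) := \|p_\lambda\|.$$

Clearly, $I_\alpha(P_\lambda, R) \ge I_\alpha(Q,R)$ for $\lambda \in (0,1)$ implies that

$$\frac{I_f(P_\lambda', R') - I_f(P_0', R')}{\lambda} \ge 0 \quad \text{for } \lambda \in (0,1). \tag{15}$$

Therefore the limiting value as $\lambda \downarrow 0$, the derivative of $I_f(P_\lambda', R')$ with respect to $\lambda$ evaluated at $\lambda = 0$, should be $\ge 0$. We then have

$$\begin{aligned}
\frac{s(\lambda) - s(0)}{\lambda} &= \frac{\mathrm{sgn}(\rho)}{\lambda} \left[ \int p_\lambda \, (r')^{-\rho} \, d\mu - \int q \, (r')^{-\rho} \, d\mu \right] \\
&= \mathrm{sgn}(\rho) \int (p - q) \, (r')^{-\rho} \, d\mu \\
&= \mathrm{sgn}(\rho) \left[ \int p \, (r')^{-\rho} \, d\mu - \int q \, (r')^{-\rho} \, d\mu \right].
\end{aligned}$$

So $\dot{s}(0) = \lim_{\lambda \downarrow 0} (s(\lambda) - s(0))/\lambda$ exists and equals the above expression. For $\alpha > 1$, we have

$$\left| \frac{\partial}{\partial \lambda} (p_\lambda)^\alpha \right| = \alpha |p - q| (p_\lambda)^{\alpha-1} \le \alpha (p+q)^\alpha,$$

while for $\alpha < 1$, we have

$$\left| \frac{\partial}{\partial \lambda} (p_\lambda)^\alpha \right| = \alpha |p - q| (p_\lambda)^{\alpha-1} \le \frac{\alpha (p+q)^\alpha}{\{\min(\lambda, 1-\lambda)\}^{1-\alpha}},$$

and both upper bounds are in $L^1(\mu)$ for a fixed $\lambda > 0$. Therefore by the chain rule and [15, Th. 2.27], we get

$$\dot{t}(\lambda) = \left( \int (p_\lambda)^\alpha \, d\mu \right)^{\frac{1}{\alpha} - 1} \int (p_\lambda)^{\alpha-1} (p - q) \, d\mu$$

for each $\lambda > 0$. Taking $\lambda \downarrow 0$, we get

$$\begin{aligned}
\dot{t}(0) &= \left( \int q^\alpha \, d\mu \right)^{\frac{1}{\alpha} - 1} \int q^{\alpha-1} (p - q) \, d\mu \\
&= \left( \int q^\alpha \, d\mu \right)^{\frac{1}{\alpha} - 1} \left[ \int p \, q^{\alpha-1} \, d\mu - \int q^\alpha \, d\mu \right] \\
&= \int p \, (q')^{-\rho} \, d\mu - \|q\|.
\end{aligned}$$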


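Thus

$$\frac{1}{\lambda} \left[ \frac{s(\lambda)}{t(\lambda)} - \frac{s(0)}{t(0)} \right] = \frac{1}{t(\lambda) t(0)} \left[ t(0) \cdot \frac{s(\lambda) - s(0)}{\lambda} - s(0) \cdot \frac{t(\lambda) - t(0)}{\lambda} \right].$$

It follows that the derivative of $s(\lambda)/t(\lambda)$ exists at $\lambda = 0$ and is given by $(t(0)\dot{s}(0) - s(0)\dot{t}(0))/t^2(0)$. Equation (15) together with $t(0) > 0$ imply that

$$\dot{s}(0) - s(0) \cdot \frac{\dot{t}(0)}{t(0)} \ge 0. \tag{16}$$

Consequently, $\dot{t}(0)$ is necessarily finite. Substituting the values of $s(0)$, $\dot{s}(0)$, $t(0)$ and $\dot{t}(0)$ in (16) we get the required inequality (14).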

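To prove the converse "if" part, let us assume that

$$I_\alpha(P,R) \ge I_\alpha(P,Q) + I_\alpha(Q,R),$$

which is the same as (14). It also implies that $I_\alpha(P,Q)$ is finite. From the trivial statement $I_\alpha(Q,R) = I_\alpha(Q,Q) + I_\alpha(Q,R)$, we have

$$\mathrm{sgn}(\rho) \int q \, (r')^{-\rho} \, d\mu = \frac{\mathrm{sgn}(\rho)}{\|q\|} \int q \, (q')^{-\rho} \, d\mu \cdot \int q \, (r')^{-\rho} \, d\mu. \tag{17}$$

A $\lambda$-weighted linear combination of (14) and (17) yields

$$\mathrm{sgn}(\rho) \int p_\lambda \, (r')^{-\rho} \, d\mu \ge \frac{\mathrm{sgn}(\rho)}{\|q\|} \int p_\lambda \, (q')^{-\rho} \, d\mu \cdot \int q \, (r')^{-\rho} \, d\mu,$$

i.e.,

$$I_\alpha(P_\lambda, R) \ge I_\alpha(P_\lambda, Q) + I_\alpha(Q,R) \ge I_\alpha(Q,R).$$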

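2) We next prove statement 2). From $I_\alpha(Q,R)$ being finite, we claim that $I_\alpha(P,R)$ and $I_\alpha(S,R)$ are also finite. From (13), it is clear that $p/q \le \lambda^{-1}$, and thus $p/r \le \lambda^{-1} q/r$. As a consequence, we have

$$\left(\frac{p'}{r'}\right)^{1+\rho} = \frac{p}{r} \cdot \frac{\|r\|}{\|p\|} \le \lambda^{-1} \frac{\|q\|}{\|p\|} \left(\frac{q'}{r'}\right)^{1+\rho}.$$

Integrating with respect to $R'$, we get

$$\int \left(\frac{p'}{r'}\right)^{1+\rho} dR' \le \lambda^{-1} \frac{\|q\|}{\|p\|} \int \left(\frac{q'}{r'}\right)^{1+\rho} dR' < \infty.$$

Taking the sign of $\rho$ appropriately, it immediately follows that $I_\alpha(P,R) \le I_\alpha(Q,R) +$ a finite constant, and is therefore finite. Similarly $I_\alpha(S,R)$ is also finite. Applying the first part of the theorem, we get

$$\begin{aligned}
I_\alpha(P,R) &\ge I_\alpha(P,Q) + I_\alpha(Q,R), \\
I_\alpha(S,R) &\ge I_\alpha(S,Q) + I_\alpha(Q,R).
\end{aligned}$$

If either of these were a strict inequality, then the linear combination $Q = \lambda P + (1-\lambda) S$ would satisfy (17) with strict inequality, a contradiction. So both of the above must be equalities, proving the "only if" part. The converse "if" part trivially follows from statement 1). ■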



